Defining Reference Sequences for Nocardia Species by Similarity and Clustering Analyses of 16S rRNA Gene Sequence Data
Abstract
BACKGROUND The intra- and inter-species genetic diversity of bacteria and the absence of 'reference' sequences, that is, the most representative sequences of individual species, present a significant challenge for sequence-based identification. The aims of this study were to determine the utility and compare the performance of several clustering and classification algorithms for identifying the species of 364 16S rRNA gene sequences with a defined species in GenBank and 110 16S rRNA gene sequences with no defined species, all within the genus Nocardia.

METHODS A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm, the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for dimensionality reduction and visualization.

RESULTS The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species were identified as the 'centroids' of their respective clusters, that is, the sequences from which the distances to all other sequences were minimized. The 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578.

CONCLUSION The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra-species variability.
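The two ideas the abstract relies on, picking a cluster 'centroid' from a distance matrix and assigning an unlabelled sequence by kNN, can be illustrated with a minimal sketch. This is not the authors' LM algorithm or their pipeline. The function names, the toy distance values, the placeholder labels "A" and "B", and the choice of k = 3 are all assumptions made for illustration, and the centroid is taken here to be the cluster member with the smallest summed distance to the other members of its own cluster.

```python
from collections import Counter


def cluster_centroids(dist, labels):
    """Return {cluster label: index of its most representative member}.

    dist is a symmetric pairwise distance matrix (list of lists) and
    labels[i] is the cluster of sequence i. The representative is the
    member whose summed distance to its own cluster is smallest; ties
    go to the lowest index.
    """
    members = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)
    return {
        label: min(idxs, key=lambda i: sum(dist[i][j] for j in idxs))
        for label, idxs in members.items()
    }


def knn_classify(dist_to_refs, ref_labels, k=3):
    """Assign a query the majority label among its k nearest references."""
    order = sorted(range(len(ref_labels)), key=lambda i: dist_to_refs[i])
    votes = Counter(ref_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical pairwise distances between five sequences (illustrative only).
dist = [
    [0.00, 0.01, 0.03, 0.20, 0.22],
    [0.01, 0.00, 0.01, 0.21, 0.21],
    [0.03, 0.01, 0.00, 0.19, 0.20],
    [0.20, 0.21, 0.19, 0.00, 0.02],
    [0.22, 0.21, 0.20, 0.02, 0.00],
]
labels = ["A", "A", "A", "B", "B"]  # placeholder cluster labels

centroids = cluster_centroids(dist, labels)

# Distances from one hypothetical genus-level sequence to the five references.
query = [0.02, 0.01, 0.02, 0.20, 0.21]
predicted = knn_classify(query, labels, k=3)
```

In this toy data, sequence 1 sits closest to the rest of cluster "A" and is returned as its representative, and the query is assigned to "A" because its three nearest references all carry that label.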
Similar resources
Analysis of secA1 gene sequences for identification of Nocardia species.
Molecular methodologies, especially 16S rRNA gene sequence analysis, have allowed the recognition of many new species of Nocardia and to date have been the most precise methods for identifying isolates reliably to the species level. We describe here a novel method for identifying Nocardia isolates by using sequence analysis of a portion of the secA1 gene. A region of the secA1 gene of 30 type o...
Molecular Detection of Novel Genetic Variants Associated to Anaplasma ovis among Dromedary Camels in Iran
To the best of our knowledge, little information is available regarding the presence of Anaplasma species in camels in Iran. This study sought to investigate the presence of Anaplasma species by microscopy and polymerase chain reaction (PCR) assays in 100 healthy dromedaries (Camelus dromedarius) arriving for slaughter. The microscopic examination of Giemsa-stained blood films revealed that Ana...
Genetic variations of avian Pasteurella multocida as demonstrated by 16S-23S rRNA gene sequences comparison
Pasteurella multocida is known as an important heterogenic bacterial agent that causes severe diseases such as fowl cholera in poultry and haemorrhagic septicaemia in cattle and buffalo. A polymerase chain reaction (PCR) assay was developed using primers derived from a conserved part of the 16S-23S rRNA gene. The PCR amplified a fragment of 0.7 kb using DNA from nine avian P. multocida isolates...
Multiple copies of the 16S rRNA gene in Nocardia nova isolates and implications for sequence-based identification procedures.
Molecular investigation of two Nocardia patient isolates showed unusual restriction fragment length polymorphism patterns with restriction endonuclease assays (REA) using an amplified portion of the 16S rRNA gene. Patterns typical of Nocardia nova were obtained with REA of an amplified portion of the 65-kDa heat shock protein gene. Subsequent sequence analysis of the 16S rRNA gene regions of th...
Molecular characterization of Mycoplasma synoviae isolates from commercial chickens in Iran
Detection of Mycoplasma synoviae (MS) by culture and polymerase chain reaction (PCR) has been reported from commercial chicken farms in different provinces of Iran. In some reports, phylogenetic analysis of MS isolates based on the 16S rRNA and variable lipoprotein hemagglutinin (vlhA) genes has been carried out. The PCR product containing partial 16S rRNA genes of Iranian isolates was sequence...